14 research outputs found

WikiM: Metapaths based Wikification of Scientific Abstracts

Authors: Goyal Pawan; Jana Abhik; Mooriyath Sruthi; Mukherjee Animesh
Publication date: 09/05/2017

To disseminate the exponentially growing body of knowledge produced in the form of scientific publications, it would be best to design mechanisms that connect it with an existing rich repository of concepts: Wikipedia. Doing so not only makes scientific reading simpler (by connecting the concepts used in scientific articles to their Wikipedia explanations) but also improves the overall quality of the article. In this paper, we present a novel metapath-based method, WikiM, to efficiently wikify scientific abstracts, a topic that has rarely been investigated in the literature. One of the prime motivations for this work is the observation that wikified abstracts of scientific documents help a reader decide, better than plain abstracts do, whether (s)he would be interested in reading the full article. We perform mention extraction mostly through traditional tf-idf measures coupled with a set of smart filters. The entity linking relies heavily on the rich citation and author publication networks. Our observation is that various metapaths defined over these networks can significantly enhance the overall performance of the system. For mention extraction and entity linking, we outperform most of the competing state-of-the-art techniques by a large margin, arriving at precision values of 72.42% and 73.8% respectively over a dataset from the ACL Anthology Network. To establish the robustness of our scheme, we wikify three other datasets and obtain precision values of 63.41%-94.03% and 67.67%-73.29% respectively for the mention extraction and entity linking phases.
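As a rough illustration of the tf-idf step mentioned in the abstract, the sketch below scores the tokens of one abstract against a small corpus and keeps the top-scoring ones as candidate mentions. It is not the WikiM implementation: the function names, the smoothed idf formula, and the stopword filter (a stand-in for the unspecified "smart filters") are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus):
    """Return a tf-idf score for each distinct token in doc_tokens.

    corpus is a list of token lists. The smoothed idf used here is an
    assumption for illustration, not the formula from the paper.
    """
    n_docs = len(corpus)
    term_counts = Counter(doc_tokens)
    scores = {}
    for term, count in term_counts.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf
    return scores


def top_mentions(doc_tokens, corpus, k=3,
                 stopwords=frozenset({"the", "of", "a", "we", "for"})):
    """Keep the k best-scoring tokens after a hypothetical stopword filter."""
    scores = tfidf_scores(doc_tokens, corpus)
    kept = {t: s for t, s in scores.items() if t not in stopwords}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(kept, key=lambda t: (-kept[t], t))[:k]


corpus = [
    ["we", "parse", "the", "sentence"],
    ["the", "parser", "of", "a", "grammar"],
    ["we", "align", "the", "translation"],
]
doc = ["the", "parser", "parser", "grammar", "the"]
print(top_mentions(doc, corpus, k=2))
```

Terms that are frequent in the abstract but rare across the corpus rank highest, which is why a common word such as "the" scores below "parser" even before the filter removes it.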

arXiv.org e-Print Archive

Crossref

Logarithmic or algebraic: roughening of a generalised Kardar-Parisi-Zhang equation

Authors: Basu Abhik; Haldar Astik; Jana Debayan
Publication date: 22/06/2023

We show that a nearly phase-ordered two-dimensional (2D) active XY model on a substrate, or a nearly flat active interface that follows a generalised Kardar-Parisi-Zhang (KPZ) equation, can be stable in some parameter regimes. In these regimes, the phase fluctuations of the XY model or the interface conformation fluctuations can exhibit sub-logarithmic or super-logarithmic roughness. Specifically, an interface of lateral size $L$, or a 2D active XY model on a substrate of linear size $L$, will respectively undulate over a typical size or display typical angular fluctuations of size $[\ln(L/a)]^{\mu}$, where $\mu < 1$ ($\mu > 1$) for sub-(super-)logarithmic roughness and $a$ is a microscopic cutoff. This generalises the well-known quasi-long-range order of the 2D equilibrium XY model at low temperatures, implying surfaces that are less or more rough than 2D Edwards-Wilkinson surfaces. In other parameter regimes, there is only short-range phase order, or an algebraically rough interface. Comment: Preliminary version, 5+4 pages

arXiv.org e-Print Archive

Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty

Authors: Brasoveanu Adrian; Guo Stephen; Hoffart Johannes; Horne Benjamin D.; Jana Abhik; Mikolov Tomas; Sandhaus Evan; Zheng Zhicheng
Publication date: 13/12/2018

Entity Linking (EL) is the task of automatically identifying entity mentions in a piece of text and resolving them to a corresponding entity in a reference knowledge base such as Wikipedia. A large number of EL tools are available for different types of documents and domains, yet EL remains a challenging task in which the lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real applications. A priori approximations of the difficulty of linking a particular entity mention can facilitate the flagging of critical cases in semi-automated EL systems, while detecting latent factors that affect EL performance, such as corpus-specific features, can provide insights into how to improve a system based on the special characteristics of the underlying corpus. In this paper, we first introduce a consensus-based method to generate difficulty labels for entity mentions on arbitrary corpora. The difficulty labels are then used as training data for a supervised classification task that predicts the EL difficulty of entity mentions using a variety of features. Experiments over a corpus of news articles show that EL difficulty can be estimated with high accuracy, also revealing latent features that affect EL performance. Finally, evaluation results demonstrate the effectiveness of the proposed method in informing semi-automated EL pipelines. Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019)
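One possible reading of a consensus-based labelling step is sketched below: several EL systems each propose an entity for the same mention, and the share of systems agreeing on the most common answer determines a difficulty label. The abstract does not specify the actual rule, so the label names ("easy", "medium", "hard"), the thresholds, and the function name are hypothetical.

```python
from collections import Counter


def difficulty_label(predictions):
    """Assign a difficulty label to one mention from several systems' outputs.

    predictions is a list with one entity id (or None when a system gave no
    link) per EL system. Labels and thresholds are illustrative assumptions.
    """
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        # No system produced a link at all.
        return "hard"
    top_count = votes.most_common(1)[0][1]
    agreement = top_count / len(predictions)
    if agreement == 1.0:
        return "easy"    # every system agrees
    if agreement > 0.5:
        return "medium"  # a strict majority agrees
    return "hard"        # no majority


print(difficulty_label(["Q1", "Q1", "Q1"]))
print(difficulty_label(["Q1", "Q1", "Q2"]))
print(difficulty_label(["Q1", "Q2", "Q3"]))
```

Labels produced this way need no manual annotation, which is what would make them usable as training data for a downstream difficulty classifier on arbitrary corpora.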

arXiv.org e-Print Archive

Crossref